Dataset statistics
| Number of variables | 6 |
|---|---|
| Number of observations | 843 |
| Missing cells | 0 |
| Missing cells (%) | 0.0% |
| Duplicate rows | 0 |
| Duplicate rows (%) | 0.0% |
| Total size in memory | 39.6 KiB |
| Average record size in memory | 48.2 B |
Variable types
| Numeric | 6 |
|---|---|
| Houses - median sale price ($) is highly correlated with INCOME_36 and 3 other fields | High correlation |
| INCOME_36 is highly correlated with Houses - median sale price ($) and 3 other fields | High correlation |
| INCOME_21 is highly correlated with Houses - median sale price ($) and 1 other field | High correlation |
| INCOME_27 is highly correlated with Houses - median sale price ($) and 2 other fields | High correlation |
| INCOME_24 is highly correlated with Houses - median sale price ($) and 2 other fields | High correlation |
| Houses - median sale price ($) is highly correlated with INCOME_36 and 2 other fields | High correlation |
| INCOME_36 is highly correlated with Houses - median sale price ($) and 2 other fields | High correlation |
| INCOME_21 is highly correlated with Houses - median sale price ($) and 1 other field | High correlation |
| INCOME_24 is highly correlated with Houses - median sale price ($) and 1 other field | High correlation |
| INCOME_36 is highly correlated with INCOME_21 | High correlation |
| INCOME_21 is highly correlated with INCOME_36 | High correlation |
| INCOME_36 is highly correlated with Houses - median sale price ($) and 4 other fields | High correlation |
| Houses - median sale price ($) is highly correlated with INCOME_36 and 4 other fields | High correlation |
| INCOME_21 is highly correlated with INCOME_36 and 4 other fields | High correlation |
| INCOME_27 is highly correlated with INCOME_36 and 4 other fields | High correlation |
| INCOME_24 is highly correlated with INCOME_36 and 4 other fields | High correlation |
| INCOME_30 is highly correlated with INCOME_36 and 4 other fields | High correlation |
Reproduction
| Analysis started | 2021-08-18 03:37:38.033403 |
|---|---|
| Analysis finished | 2021-08-18 03:37:46.299675 |
| Duration | 8.27 seconds |
| Software version | pandas-profiling v3.0.0 |
| Download configuration | config.json |
Houses - median sale price ($)
| Distinct | 496 |
|---|---|
| Distinct (%) | 58.8% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 2.865771414 × 10⁻¹⁶ |
| Minimum | -1.389170896 |
| Maximum | 8.66419486 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 491 |
| Negative (%) | 58.2% |
| Memory size | 6.7 KiB |
Quantile statistics
| Minimum | -1.389170896 |
|---|---|
| 5-th percentile | -0.9824296019 |
| Q1 | -0.5965537842 |
| median | -0.1833089654 |
| Q3 | 0.1791270642 |
| 95-th percentile | 1.737940717 |
| Maximum | 8.66419486 |
| Range | 10.05336576 |
| Interquartile range (IQR) | 0.7756808484 |
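The range and IQR rows follow from the other quantiles in this table: the range is the maximum minus the minimum, and the IQR is Q3 minus Q1. A minimal Python check using the values printed above (the variable names are illustrative only):

```python
import math

# Quantiles copied from the table above.
minimum = -1.389170896
q1 = -0.5965537842
q3 = 0.1791270642
maximum = 8.66419486

value_range = maximum - minimum  # spread of the full sample
iqr = q3 - q1                    # spread of the middle 50% of the sample

# Both agree with the reported rows up to rounding of the printed digits.
assert math.isclose(value_range, 10.05336576, abs_tol=1e-8)
assert math.isclose(iqr, 0.7756808484, abs_tol=1e-9)
```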
Descriptive statistics
| Standard deviation | 1.000593648 |
|---|---|
| Coefficient of variation (CV) | 3.491533356 × 10¹⁵ |
| Kurtosis | 15.56378546 |
| Mean | 2.865771414 × 10⁻¹⁶ |
| Median Absolute Deviation (MAD) | 0.3996958083 |
| Skewness | 3.088187381 |
| Sum | 2.415845302 × 10⁻¹³ |
| Variance | 1.001187648 |
| Monotonicity | Not monotonic |
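The standard deviation of 1.000593648 equals √(843/842), and the mean is zero up to floating-point noise. That pattern is what one would expect if the column had been z-scored with the population standard deviation (dividing by n) and then summarized with the sample standard deviation (dividing by n - 1). The report does not state this, so treat it as an inference. The sketch below reproduces the effect on synthetic data; the log-normal sample is an assumption standing in for the real column.

```python
import math
import random
import statistics

random.seed(0)
n = 843  # number of observations in the report

# Synthetic, right-skewed stand-in for the real column (assumption).
raw = [random.lognormvariate(0.0, 0.5) for _ in range(n)]

# Z-score using the population standard deviation (divide by n).
mu = statistics.fmean(raw)
sigma = statistics.pstdev(raw)
z = [(x - mu) / sigma for x in raw]

mean_z = statistics.fmean(z)         # floating-point noise around 0
sample_std = statistics.stdev(z)     # divides by n - 1
sample_var = statistics.variance(z)  # divides by n - 1

# Sample statistics of a population-standardized column depend only on n.
assert math.isclose(sample_std, math.sqrt(n / (n - 1)), rel_tol=1e-9)
assert math.isclose(sample_var, n / (n - 1), rel_tol=1e-9)
assert abs(mean_z) < 1e-12

# CV is std / mean, so a mean that is only round-off noise makes the
# coefficient of variation astronomically large and uninformative.
cv = sample_std / mean_z if mean_z != 0 else math.inf
print(f"std={sample_std:.9f} var={sample_var:.9f} |cv|={abs(cv):.3e}")
```

Under that assumption, the very large coefficient of variation in this table carries no information about the data: it is the ratio of a standard deviation near 1 to a mean that is numerically zero.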
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 3.154624824 × 10⁻¹⁶ | 78 | 9.3% |
| 0.06057322275 | 10 | 1.2% |
| 0.1147692646 | 8 | 0.9% |
| -0.5491322476 | 8 | 0.9% |
| -0.4949362058 | 7 | 0.8% |
| -0.6168772999 | 7 | 0.8% |
| -0.8878575089 | 6 | 0.7% |
| -0.4813871953 | 6 | 0.7% |
| -0.4136421431 | 6 | 0.7% |
| -0.2104069863 | 6 | 0.7% |
| Other values (486) | 701 |
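The table above lists the ten most frequent values and folds everything else into an "Other values" row. A minimal sketch of that bookkeeping with `collections.Counter` follows; `frequency_table` is a hypothetical helper written for illustration, not part of pandas-profiling.

```python
from collections import Counter


def frequency_table(values, top=10):
    """Return (rows, other).

    rows  : (value, count, percent) for the `top` most common values
    other : (number of remaining distinct values, their combined count)
    """
    counts = Counter(values)
    total = len(values)
    most_common = counts.most_common(top)
    rows = [(v, c, round(100.0 * c / total, 1)) for v, c in most_common]
    shown = sum(c for _, c in most_common)
    other = (len(counts) - len(most_common), total - shown)
    return rows, other


# Toy data: one value repeated often, plus a tail of unique values.
data = [0.0] * 5 + [1.5] * 2 + [float(i) for i in range(10, 20)]
rows, other = frequency_table(data, top=2)
```

The same arithmetic applies to the table above: 78 of 843 rows is 9.3%, the ten listed values account for 142 rows, and the remaining 843 - 142 = 701 rows are spread over 496 - 10 = 486 distinct values.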
| Value | Count | Frequency (%) |
| -1.389170896 | 1 | |
| -1.316006239 | 1 | |
| -1.240131781 | 1 | |
| -1.199484749 | 1 | |
| -1.192710244 | 1 | |
| -1.185935739 | 1 | |
| -1.172386728 | 2 | |
| -1.169676926 | 1 | |
| -1.164257322 | 1 | |
| -1.158837718 | 1 |
| Value | Count | Frequency (%) |
| 8.66419486 | 1 | |
| 7.729313139 | 1 | |
| 5.534373445 | 1 | |
| 5.033060059 | 1 | |
| 4.856922923 | 1 | |
| 4.726852423 | 1 | |
| 4.363738942 | 1 | |
| 4.214699827 | 1 | |
| 4.077854822 | 1 | |
| 4.057531306 | 1 |
INCOME_36
| Distinct | 701 |
|---|---|
| Distinct (%) | 83.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1.68574789 × 10⁻¹⁷ |
| Minimum | -1.99373369 |
| Maximum | 7.378438997 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 558 |
| Negative (%) | 66.2% |
| Memory size | 6.7 KiB |
Quantile statistics
| Minimum | -1.99373369 |
|---|---|
| 5-th percentile | -1.212694667 |
| Q1 | -0.5910079861 |
| median | -0.0005273952406 |
| Q3 | 0.303651045 |
| 95-th percentile | 1.928929517 |
| Maximum | 7.378438997 |
| Range | 9.372172688 |
| Interquartile range (IQR) | 0.8946590312 |
Descriptive statistics
| Standard deviation | 1.000593648 |
|---|---|
| Coefficient of variation (CV) | 5.935606705 × 10¹⁶ |
| Kurtosis | 10.82464248 |
| Mean | 1.68574789 × 10⁻¹⁷ |
| Median Absolute Deviation (MAD) | 0.4660503543 |
| Skewness | 2.377277901 |
| Sum | 1.421085472 × 10⁻¹⁴ |
| Variance | 1.001187648 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| -5.004564049 × 10⁻¹⁸ | 136 | 16.1% |
| -0.4106967497 | 2 | 0.2% |
| -0.225670114 | 2 | 0.2% |
| -0.2505139338 | 2 | 0.2% |
| -1.365529905 | 2 | 0.2% |
| -0.4654516841 | 2 | 0.2% |
| -1.078383206 | 2 | 0.2% |
| 0.8907534395 | 2 | 0.2% |
| -0.590163437 | 1 | 0.1% |
| -0.7401412844 | 1 | 0.1% |
| Other values (691) | 691 |
| Value | Count | Frequency (%) |
| -1.99373369 | 1 | |
| -1.615234928 | 1 | |
| -1.505302784 | 1 | |
| -1.489256351 | 1 | |
| -1.482781474 | 1 | |
| -1.480247827 | 1 | |
| -1.465749734 | 1 | |
| -1.448929131 | 1 | |
| -1.420144082 | 1 | |
| -1.417891951 | 1 |
| Value | Count | Frequency (%) |
| 7.378438997 | 1 | |
| 7.036466984 | 1 | |
| 6.163414334 | 1 | |
| 5.70982108 | 1 | |
| 5.455963691 | 1 | |
| 4.141352611 | 1 | |
| 3.711477111 | 1 | |
| 3.661156059 | 1 | |
| 3.600700418 | 1 | |
| 3.351417671 | 1 |
INCOME_30
| Distinct | 777 |
|---|---|
| Distinct (%) | 92.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | -2.191472257 × 10⁻¹⁶ |
| Minimum | -2.573652203 |
| Maximum | 5.857210978 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 479 |
| Negative (%) | 56.8% |
| Memory size | 6.7 KiB |
Quantile statistics
| Minimum | -2.573652203 |
|---|---|
| 5-th percentile | -1.49310825 |
| Q1 | -0.6217987022 |
| median | -0.005499171865 |
| Q3 | 0.4567254759 |
| 95-th percentile | 1.756835074 |
| Maximum | 5.857210978 |
| Range | 8.430863181 |
| Interquartile range (IQR) | 1.078524178 |
Descriptive statistics
| Standard deviation | 1.000593648 |
|---|---|
| Coefficient of variation (CV) | -4.565851311 × 10¹⁵ |
| Kurtosis | 2.720552367 |
| Mean | -2.191472257 × 10⁻¹⁶ |
| Median Absolute Deviation (MAD) | 0.543551877 |
| Skewness | 0.9773246939 |
| Sum | -1.847411113 × 10⁻¹³ |
| Variance | 1.001187648 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| -2.396922782 × 10⁻¹⁶ | 52 | 6.2% |
| -0.2505533065 | 2 | 0.2% |
| -0.04678838054 | 2 | 0.2% |
| -0.5495872724 | 2 | 0.2% |
| -0.6185813613 | 2 | 0.2% |
| -0.5778283761 | 2 | 0.2% |
| 0.1355276058 | 2 | 0.2% |
| -0.5270658858 | 2 | 0.2% |
| -1.050598753 | 2 | 0.2% |
| -0.015508677 | 2 | 0.2% |
| Other values (767) | 773 |
| Value | Count | Frequency (%) |
| -2.573652203 | 1 | |
| -2.361307702 | 1 | |
| -2.204551702 | 1 | |
| -2.194899679 | 1 | |
| -2.159330187 | 1 | |
| -2.093017216 | 1 | |
| -2.092302251 | 1 | |
| -2.080147852 | 1 | |
| -1.938048627 | 1 | |
| -1.921246958 | 1 |
| Value | Count | Frequency (%) |
| 5.857210978 | 1 | |
| 4.328259069 | 1 | |
| 3.81366326 | 1 | |
| 3.792929285 | 1 | |
| 3.771480346 | 1 | |
| 3.671385294 | 1 | |
| 3.372887552 | 1 | |
| 3.336603096 | 1 | |
| 3.212377987 | 1 | |
| 3.193788907 | 1 |
INCOME_21
| Distinct | 734 |
|---|---|
| Distinct (%) | 87.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | -9.861625158 × 10⁻¹⁶ |
| Minimum | -2.09110046 |
| Maximum | 4.667927675 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 525 |
| Negative (%) | 62.3% |
| Memory size | 6.7 KiB |
Quantile statistics
| Minimum | -2.09110046 |
|---|---|
| 5-th percentile | -1.370658788 |
| Q1 | -0.6500172727 |
| median | -0.02854987672 |
| Q3 | 0.4760964198 |
| 95-th percentile | 1.752144281 |
| Maximum | 4.667927675 |
| Range | 6.759028135 |
| Interquartile range (IQR) | 1.126113692 |
Descriptive statistics
| Standard deviation | 1.000593648 |
|---|---|
| Coefficient of variation (CV) | -1.014633625 × 10¹⁵ |
| Kurtosis | 2.777344673 |
| Mean | -9.861625158 × 10⁻¹⁶ |
| Median Absolute Deviation (MAD) | 0.5723321548 |
| Skewness | 1.213443381 |
| Sum | -8.313350008 × 10⁻¹³ |
| Variance | 1.001187648 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| -1.264388796 × 10⁻¹⁵ | 99 | 11.7% |
| -1.005520083 | 2 | 0.2% |
| -0.665526805 | 2 | 0.2% |
| -0.4940965118 | 2 | 0.2% |
| 0.5540350782 | 2 | 0.2% |
| 0.2327227496 | 2 | 0.2% |
| 0.3161353606 | 2 | 0.2% |
| -0.3946095956 | 2 | 0.2% |
| 0.6496989164 | 2 | 0.2% |
| -0.08902401969 | 2 | 0.2% |
| Other values (724) | 726 |
| Value | Count | Frequency (%) |
| -2.09110046 | 1 | |
| -1.987442913 | 1 | |
| -1.738421514 | 1 | |
| -1.684898422 | 1 | |
| -1.661525513 | 1 | |
| -1.661264849 | 1 | |
| -1.647015194 | 1 | |
| -1.623120957 | 1 | |
| -1.608697526 | 1 | |
| -1.59688074 | 1 |
| Value | Count | Frequency (%) |
| 4.667927675 | 1 | |
| 4.369814479 | 1 | |
| 4.314292959 | 1 | |
| 4.213328945 | 1 | |
| 4.170319317 | 1 | |
| 4.12635392 | 1 | |
| 3.985768915 | 1 | |
| 3.756731788 | 1 | |
| 3.412654767 | 1 | |
| 3.341840936 | 1 |
INCOME_27
| Distinct | 764 |
|---|---|
| Distinct (%) | 90.6% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | -1.264310918 × 10⁻¹⁶ |
| Minimum | -1.392692792 |
| Maximum | 10.08686681 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 552 |
| Negative (%) | 65.5% |
| Memory size | 6.7 KiB |
Quantile statistics
| Minimum | -1.392692792 |
|---|---|
| 5-th percentile | -0.9595430277 |
| Q1 | -0.5853927909 |
| median | -0.1674990894 |
| Q3 | 0.1851011737 |
| 95-th percentile | 1.7085101 |
| Maximum | 10.08686681 |
| Range | 11.4795596 |
| Interquartile range (IQR) | 0.7704939645 |
Descriptive statistics
| Standard deviation | 1.000593648 |
|---|---|
| Coefficient of variation (CV) | -7.914142273 × 10¹⁵ |
| Kurtosis | 21.17458657 |
| Mean | -1.264310918 × 10⁻¹⁶ |
| Median Absolute Deviation (MAD) | 0.4071422579 |
| Skewness | 3.481790033 |
| Sum | -1.065814104 × 10⁻¹³ |
| Variance | 1.001187648 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| -1.427500873 × 10⁻¹⁶ | 44 | 5.2% |
| -0.0984387218 | 3 | 0.4% |
| -0.416273368 | 2 | 0.2% |
| -0.2348329477 | 2 | 0.2% |
| -0.3125258613 | 2 | 0.2% |
| -0.6937704813 | 2 | 0.2% |
| -0.3447017143 | 2 | 0.2% |
| -0.5746413473 | 2 | 0.2% |
| -0.6204723185 | 2 | 0.2% |
| 0.1599098351 | 2 | 0.2% |
| Other values (754) | 780 |
| Value | Count | Frequency (%) |
| -1.392692792 | 1 | |
| -1.277173632 | 1 | |
| -1.232127438 | 1 | |
| -1.200893317 | 1 | |
| -1.185197779 | 1 | |
| -1.171542661 | 1 | |
| -1.153178881 | 1 | |
| -1.146115889 | 1 | |
| -1.115038724 | 1 | |
| -1.112684393 | 1 |
| Value | Count | Frequency (%) |
| 10.08686681 | 1 | |
| 7.095768114 | 1 | |
| 6.552231631 | 1 | |
| 6.01842638 | 1 | |
| 5.363608531 | 1 | |
| 5.071200657 | 1 | |
| 4.657623228 | 1 | |
| 3.681203804 | 1 | |
| 3.63207677 | 1 | |
| 3.600057872 | 1 |
INCOME_24
| Distinct | 785 |
|---|---|
| Distinct (%) | 93.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | -4.425088212 × 10⁻¹⁶ |
| Minimum | -2.921626255 |
| Maximum | 12.29622818 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 574 |
| Negative (%) | 68.1% |
| Memory size | 6.7 KiB |
Quantile statistics
| Minimum | -2.921626255 |
|---|---|
| 5-th percentile | -0.9774474735 |
| Q1 | -0.450091947 |
| median | -0.1761817045 |
| Q3 | 0.1372121406 |
| 95-th percentile | 1.540036312 |
| Maximum | 12.29622818 |
| Range | 15.21785444 |
| Interquartile range (IQR) | 0.5873040875 |
Descriptive statistics
| Standard deviation | 1.000593648 |
|---|---|
| Coefficient of variation (CV) | -2.261183507 × 10¹⁵ |
| Kurtosis | 35.24980894 |
| Mean | -4.425088212 × 10⁻¹⁶ |
| Median Absolute Deviation (MAD) | 0.2888761134 |
| Skewness | 4.272803734 |
| Sum | -3.730349363 × 10⁻¹³ |
| Variance | 1.001187648 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| -5.627444061 × 10⁻¹⁶ | 36 | 4.3% |
| 0.0009337705959 | 2 | 0.2% |
| -0.2456357206 | 2 | 0.2% |
| -0.04114082263 | 2 | 0.2% |
| -1.147223098 | 2 | 0.2% |
| -0.1552217509 | 2 | 0.2% |
| -0.4476556424 | 2 | 0.2% |
| -0.3509768896 | 2 | 0.2% |
| 0.03937324271 | 2 | 0.2% |
| -0.6799166782 | 2 | 0.2% |
| Other values (775) | 789 |
| Value | Count | Frequency (%) |
| -2.921626255 | 1 | |
| -2.34031625 | 1 | |
| -1.838050794 | 1 | |
| -1.800152723 | 1 | |
| -1.721108174 | 1 | |
| -1.720721459 | 1 | |
| -1.682900731 | 1 | |
| -1.60223198 | 1 | |
| -1.508646947 | 1 | |
| -1.473301195 | 1 |
| Value | Count | Frequency (%) |
| 12.29622818 | 1 | |
| 6.937672956 | 1 | |
| 6.022782582 | 1 | |
| 5.619902884 | 1 | |
| 5.177578254 | 1 | |
| 5.012373601 | 1 | |
| 4.486054471 | 1 | |
| 4.431295625 | 1 | |
| 4.288133728 | 1 | |
| 3.954166644 | 1 |
Pearson's r
Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the steepness of the line (its angle to the x-axis) does not change the magnitude of r. To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
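The definition above (covariance divided by the product of the standard deviations) can be sketched in plain Python. `pearson_r` is a hypothetical helper written for illustration, not the routine the report itself uses.

```python
import math


def pearson_r(x, y):
    """Pearson's r: covariance of x and y over the product of their std devs."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    cov = sum(a * b for a, b in zip(dx, dy))
    norm = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if norm == 0:
        raise ValueError("r is undefined when either variable is constant")
    # The 1/n (or 1/(n-1)) factors of covariance and variances cancel.
    return cov / norm


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.1, 5.9, 8.2, 9.8]
r = pearson_r(x, y)

# Shifting and (positively) rescaling a variable leaves r unchanged.
r_scaled = pearson_r([10 * a + 3 for a in x], y)
```

Because the normalizing factors cancel, the result is the same whether population or sample variances are used.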
Spearman's ρ
Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation. To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
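In other words, ρ is Pearson's r computed on ranks. The sketch below makes that explicit; `average_ranks`, `pearson_r` and `spearman_rho` are hypothetical helpers, and giving tied values the mean of their positions is an assumption about tie handling.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson_r(x, y):
    """Covariance over the product of standard deviations."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman_rho(x, y):
    """Spearman's rho: Pearson's r applied to the rank variables."""
    return pearson_r(average_ranks(x), average_ranks(y))


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [a ** 3 for a in x]  # nonlinear but strictly increasing
rho = spearman_rho(x, y)
r = pearson_r(x, y)
```

For this strictly increasing but nonlinear relationship, ρ reaches its maximum while r stays below 1, which is the sense in which ρ catches monotonic relationships that r understates.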
Kendall's τ
Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation. To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
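The formula in the text corresponds to the variant usually called τ-a, which applies no correction for ties. A minimal sketch follows; `kendall_tau_a` is a hypothetical helper, and statistical libraries often report a tie-corrected variant instead, so values can differ when the data contain ties.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """(concordant - discordant) / total number of pairs, no tie correction."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        s = (x1 - x2) * (y1 - y2)
        if s > 0:
            concordant += 1  # both variables order the pair the same way
        elif s < 0:
            discordant += 1  # the variables order the pair oppositely
        # s == 0 means a tie in x or y; the pair counts toward neither
    n_pairs = n * (n - 1) // 2
    return (concordant - discordant) / n_pairs


x = [1, 2, 3, 4, 5]
y = [3, 1, 2, 5, 4]
tau = kendall_tau_a(x, y)  # 7 concordant and 3 discordant pairs out of 10
```

This pairwise loop is quadratic in the number of observations, which is fine for illustration but slow for large columns.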
Phik (φk)
Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. Extensive documentation is available from the phik project.
A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.
First rows
| | Houses - median sale price ($) | INCOME_36 | INCOME_30 | INCOME_21 | INCOME_27 | INCOME_24 |
|---|---|---|---|---|---|---|
| 0 | 3.154625e-16 | -5.273952e-04 | 0.693379 | 1.689469e-01 | -6.159206e-01 | -0.346878 |
| 1 | 2.638084e-01 | -5.004564e-18 | 2.031971 | 1.201004e+00 | 2.441949e-01 | -0.679917 |
| 2 | -4.813872e-01 | -1.292054e+00 | 0.488184 | -1.647015e+00 | -9.843872e-02 | -0.222974 |
| 3 | -5.288087e-01 | -1.229206e+00 | 1.133797 | -1.202495e+00 | -1.427501e-16 | -0.908079 |
| 4 | -4.136421e-01 | -8.540850e-01 | 1.165970 | -1.112306e+00 | -1.987332e-01 | 0.590829 |
| 5 | -4.312559e-01 | -1.094007e+00 | 0.937182 | -1.264389e-15 | -1.131925e-01 | -0.209594 |
| 6 | -3.052501e-01 | -1.031581e+00 | 0.696596 | -1.099881e+00 | 9.555813e-02 | -0.206113 |
| 7 | 3.154625e-16 | -4.973334e-01 | -0.293809 | -5.356290e-01 | -4.181568e-01 | -0.213229 |
| 8 | 3.154625e-16 | -2.086384e-01 | -0.118642 | -1.693955e-01 | -2.396986e-01 | 0.337066 |
| 9 | 3.154625e-16 | -2.874630e-01 | -0.719749 | -2.254384e-01 | -4.958497e-01 | -0.098065 |
Last rows
| | Houses - median sale price ($) | INCOME_36 | INCOME_30 | INCOME_21 | INCOME_27 | INCOME_24 |
|---|---|---|---|---|---|---|
| 833 | 0.071412 | 1.022433e+00 | 3.092979 | -1.264389e-15 | -0.578408 | -3.856266e-01 |
| 834 | -0.007172 | 4.073898e-01 | 1.712025 | 8.414610e-01 | -0.568677 | -5.627444e-16 |
| 835 | -0.312025 | -6.830246e-02 | 0.531797 | 2.088947e-02 | -1.030754 | 9.451880e-02 |
| 836 | 0.602534 | 1.963894e+00 | 3.372888 | 2.229325e+00 | 0.725106 | 1.698303e+00 |
| 837 | 0.040250 | 9.886508e-01 | 2.296687 | 1.424394e+00 | -0.425848 | -3.552308e-01 |
| 838 | -0.061368 | 6.337994e-01 | 2.339942 | 1.112986e+00 | -0.623141 | -2.776557e-01 |
| 839 | -0.115564 | 3.566465e-01 | 1.778338 | 8.516269e-01 | -0.755297 | -2.012408e-01 |
| 840 | 0.494142 | 1.155520e+00 | 2.541562 | -1.264389e-15 | 0.220181 | 1.333786e+00 |
| 841 | 0.345102 | -5.004564e-18 | 2.317600 | 9.330411e-01 | 0.105603 | 7.374712e-01 |
| 842 | 0.166256 | -5.004564e-18 | 2.295078 | 1.252355e+00 | -0.234833 | 4.486724e-01 |